Data File Compression (Lossy & Lossless)

Data Storage and File Compression (Lossy & Lossless)

Objectives : Students should be able to -

demonstrate how data storage is measured.
calculate the file size of an image and sound file.
identify and describe the purpose of and need for data compression.
describe how "Lossy compression" reduces the file size.
describe how "Lossless compression" reduces the file size.
give advantages and disadvantages of lossy and lossless compression.

Measurement of data storage

Q1. Describe the following Basic Units of measuring computer memory storage.

(i) Bit :

⇒ Bit stands for binary digit. It is the basic unit of data storage in computer memory.

⇒ A bit is either 1 or 0, representing the ON or OFF state of an electrical signal, which is the only form of data a computer can work with directly.

(ii) Byte :

⇒ A group of 8 bits is called a byte.

⇒ A byte is the smallest addressable unit of memory in most computers.

⇒ Processors are built to work with a fixed number of bits at a time, which is usually a multiple of a byte, such as 8, 16, 32 or 64 bits.

(iii) Nibble :

⇒ A group of 4 bits is called a nibble, which is equal to half a byte.

⇒ A nibble can represent 2⁴ = 16 possible values. Hence, each hexadecimal digit can be represented by one nibble.
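The link between nibbles and hexadecimal digits can be illustrated with a short Python sketch. The function name `split_into_nibbles` and the sample value are assumptions chosen for illustration only.

```python
def split_into_nibbles(byte_value):
    """Split an 8-bit value into its high and low 4-bit nibbles."""
    high = byte_value >> 4      # upper 4 bits
    low = byte_value & 0x0F     # lower 4 bits
    return high, low


value = 0b10101111              # one byte (8 bits), decimal 175
high, low = split_into_nibbles(value)

# Each nibble holds a value from 0 to 15, i.e. exactly one hexadecimal digit.
print(format(value, "08b"), format(high, "X"), format(low, "X"))  # 10101111 A F
```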

Q2. Give different units of measuring computer memory in terms of bits and bytes, adopted by the IEC (International Electrotechnical Commission) that is based on the binary system.

Unit of Memory	Size (power of 2)	Equivalent to	General Name
1 Nibble	2² = 4 bits	0.5 Byte	Nibble
1 Byte	2³ = 8 bits	1 Byte	Byte
1 KiB (Kibibyte)	2¹⁰ Bytes	1024 Bytes	Kilobyte
1 MiB (Mebibyte)	2²⁰ Bytes	1024 KiB	Megabyte
1 GiB (Gibibyte)	2³⁰ Bytes	1024 MiB	Gigabyte
1 TiB (Tebibyte)	2⁴⁰ Bytes	1024 GiB	Terabyte
1 PiB (Pebibyte)	2⁵⁰ Bytes	1024 TiB	Petabyte
1 EiB (Exbibyte)	2⁶⁰ Bytes	1024 PiB	Exabyte
1 ZiB (Zebibyte)	2⁷⁰ Bytes	1024 EiB	Zettabyte
1 YiB (Yobibyte)	2⁸⁰ Bytes	1024 ZiB	Yottabyte

Note : The International System of Units (SI) used around the world adopts the decimal system of measurement (in terms of powers of 10).
In SI units, 1 Kilobyte (KB) = 1000 Bytes, 1 Megabyte (MB) = 1000 KB, 1 Gigabyte (GB) = 1000 MB, 1 Terabyte (TB) = 1000 GB. These decimal units are only an approximation for computer memory, whose size is actually a power of 2; the IEC binary units above describe it exactly.
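The gap between the binary (IEC) and decimal (SI) units can be checked with a minimal Python sketch. The names `IEC_UNITS`, `SI_UNITS` and `to_unit` are assumptions made for this example.

```python
# Binary (IEC) prefixes are powers of 2; decimal (SI) prefixes are powers of 10.
IEC_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
SI_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_unit(size_in_bytes, unit):
    """Convert a size in bytes to the named IEC or SI unit."""
    factor = IEC_UNITS.get(unit) or SI_UNITS[unit]
    return size_in_bytes / factor


one_gib = 2**30                   # 1 GiB expressed in bytes
print(to_unit(one_gib, "GiB"))    # 1.0
print(to_unit(one_gib, "GB"))     # 1.073741824
```

The same number of bytes is about 7% larger when expressed in GB than in GiB, and the difference grows with each larger prefix.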

Calculation of file size

To calculate the file size of a bitmap image :

Size of image file (in bits)	=	Image resolution (in pixels) x Colour depth (in bits)

	=	Width (in pixels) x Height (in pixels) x Colour depth (in bits)

To calculate the file size of a Mono-sound :

Size of sound file (in bits)	=	Sample rate (in Hz) x Sample resolution (in bits) x Length of sound (in seconds)

To calculate the file size of a Stereo-sound (2 channels of audio) :

Stereo is the sound recorded with two microphones placed in strategically chosen locations relative to the sound source, and played back through two channels (speakers). The two simultaneously recorded channels will be similar, but each will have distinct time-of-arrival and sound-pressure-level information.

Size of stereo-sound file (in bits)	=	Sample rate (in Hz) x Sample resolution (in bits) x Length of sound (in seconds) x 2 (number of channels)

To convert the size of file from bits to bytes, kibibytes, mebibytes, etc. :

Size of file (in bytes)	=	File size in bits ÷ 8	(since 1 byte = 8 bits)

Size of file (in Kibibytes)	=	File size in bits ÷ (8 x 1024)	(since 1 KiB = 1024 bytes)

Size of file (in Mebibytes)	=	File size in bits ÷ (8 x 1024 x 1024)	(since 1 MiB = 1024 KiB)
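The formulas in this section can be sketched in Python as follows. The function names and the sample figures are assumptions chosen for illustration; only the arithmetic comes from the formulas above.

```python
def image_size_bits(width, height, colour_depth):
    """Bitmap size in bits = width x height x colour depth."""
    return width * height * colour_depth


def sound_size_bits(sample_rate, sample_resolution, seconds, channels=1):
    """Sound size in bits; use channels=2 for a stereo recording."""
    return sample_rate * sample_resolution * seconds * channels


def bits_to_bytes(bits):
    return bits / 8


def bits_to_kib(bits):
    return bits / (8 * 1024)


def bits_to_mib(bits):
    return bits / (8 * 1024 * 1024)


# Example: a 100 x 100 pixel image with a colour depth of 8 bits.
bits = image_size_bits(100, 100, 8)
print(bits, bits_to_bytes(bits))  # 80000 10000.0

# Example: 10 seconds of stereo sound at 8000 Hz with 16-bit samples.
print(sound_size_bits(8000, 16, 10, channels=2))  # 2560000
```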

Q3. a) Calculate the file size of an image with 8 colours, captured at a resolution of 512 x 300 pixels.

8 colours = 2³ colours

Hence, Colour depth = 3 bits. (i.e. number of bits needed to represent 8 colours)

Size of image file (in bits)	=	Image resolution x colour depth

	=	512 x 300 x 3 bits

Size of image file (in bytes)	=	(512 x 300 x 3) ÷ 8	= 57600 bytes	(divide by 8, since 1 byte = 8 bits)

Size of image file (in KiB)	=	(512 x 300 x 3) ÷ (8 x 1024)	= 56.25 KiB	(since 1 KiB = 1024 bytes)

(Or, in decimal units)	=	(512 x 300 x 3) ÷ (8 x 1000)	= 57.6 KB	(taking 1 KB = 1000 bytes)

b) What would be the file size of the image if it is converted into Black and White?

Black and White means 2 colours = 2¹ colours

Hence, Colour depth = 1 bit. (i.e. number of bits needed to represent 2 colours)

Size of image file (in bits)	=	Image resolution x colour depth

	=	512 x 300 x 1 bits

Size of image file (in bytes)	=	(512 x 300 x 1) ÷ 8	= 19200 bytes	(divide by 8, since 1 byte = 8 bits)

Size of image file (in KiB)	=	(512 x 300 x 1) ÷ (8 x 1024)	= 18.75 KiB	(since 1 KiB = 1024 bytes)

(Or, in decimal units)	=	(512 x 300 x 1) ÷ (8 x 1000)	= 19.2 KB	(taking 1 KB = 1000 bytes)
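The two answers to Q3 can be checked with a short Python sketch. The helper names `colour_depth` and `image_size_kib` are assumptions for this example; the colour depth is taken as the smallest number of bits that can represent the given number of colours.

```python
def colour_depth(number_of_colours):
    """Smallest number of bits that can represent this many colours."""
    return max(1, (number_of_colours - 1).bit_length())


def image_size_kib(width, height, number_of_colours):
    """Bitmap size in KiB for the given resolution and number of colours."""
    bits = width * height * colour_depth(number_of_colours)
    return bits / (8 * 1024)


print(colour_depth(8), image_size_kib(512, 300, 8))  # 3 56.25
print(colour_depth(2), image_size_kib(512, 300, 2))  # 1 18.75
```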

Q4. A camera detector has an array of 1920 by 1536 pixels. A colour depth of 16 bits is used.

Calculate the size of a photograph taken by this camera, giving your answer in MiB.

Size of image file (in bits)	=	Image resolution x colour depth

	=	1920 x 1536 x 16 bits

Size of image file (in bytes)	=	(1920 x 1536 x 16) ÷ 8	= 5898240 bytes	(divide by 8, since 1 byte = 8 bits)

Size of image file (in MiB)	=	(1920 x 1536 x 16) ÷ (8 x 1024 x 1024)	(since 1 MiB = 1024 x 1024 bytes)

	=	5.625 MiB

Q5. Photographs have been taken by a smartphone which uses a detector with a 1024 x 1536 pixel array. The software uses a colour depth of 24 bits.

How many photographs could be stored on a 16 GiB memory card?

Size of each photo (in bits)	=	Image resolution x colour depth

	=	1024 x 1536 x 24 bits

Size of each photo (in bytes)	=	(1024 x 1536 x 24) ÷ 8 bytes	(divide by 8, since 1 byte = 8 bits)

16 GiB (in bytes)	=	16 x 1024 x 1024 x 1024 bytes

Number of photos on 16 GiB	=	16 GiB memory in bytes ÷ Size of one photo in bytes

	=	(16 x 1024 x 1024 x 1024 x 8) ÷ (1024 x 1536 x 24)	= 3640.89

	=	3640 Photos (rounded down, since only whole photos can be stored)
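The calculation in Q5 can be reproduced with a minimal Python sketch. The function name `photos_on_card` is an assumption; integer (floor) division is used because only whole photographs can be stored.

```python
def photos_on_card(card_gib, width, height, colour_depth):
    """Number of whole uncompressed photos that fit on a card of card_gib GiB."""
    # Assumes width x height x colour_depth is a whole number of bytes.
    photo_bytes = width * height * colour_depth // 8
    card_bytes = card_gib * 1024**3
    return card_bytes // photo_bytes   # floor division discards the partial photo


print(photos_on_card(16, 1024, 1536, 24))  # 3640
```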

Q6. A five minute audio is sampled at 44.1 kHz with a 16 bit resolution.

Calculate the bit rate and size of the audio file.

Bit rate (in bps)	=	Sample rate (Hz) x Sample resolution

Bit rate (in kbps)	=	(44100 x 16) ÷ 1000	= 705.6 kbps	(since 1 kbps = 1000 bits/sec)

File size (in bits)	=	Sample rate (Hz) x Sample resolution x Length of sound (in Sec)

File size (in MiB)	=	(44100 x 16 x 5 x 60) ÷ (8 x 1024 x 1024)	= 25.23 MiB
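As a check on Q6, the bit rate and file size can be computed with a few lines of Python. The function names are assumptions made for this sketch.

```python
def bit_rate_bps(sample_rate, sample_resolution, channels=1):
    """Bits produced per second of audio."""
    return sample_rate * sample_resolution * channels


def audio_size_mib(sample_rate, sample_resolution, seconds, channels=1):
    """Uncompressed audio size in MiB."""
    bits = bit_rate_bps(sample_rate, sample_resolution, channels) * seconds
    return bits / (8 * 1024 * 1024)


print(bit_rate_bps(44100, 16) / 1000)                 # 705.6 (kbps)
print(round(audio_size_mib(44100, 16, 5 * 60), 2))    # 25.23 (MiB)
```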

Q7. A 30 second audio is being sampled at the rate of 44.1 kHz using 8 bits per sample. Two channels are being used to allow for stereo recording.

Calculate :

a) the size of one channel of the recording, in bits.

Size of one channel (in bits)	=	Sample rate (Hz) x Sample resolution x Length of audio (in Sec)

	=	44100 x 8 x 30 = 10584000 bits.

b) the size of audio recording in MiB.

File size of stereo recording (in bits)	=	Size of one channel (in bits) x 2 Channels

	=	10584000 x 2 = 21168000 bits

File size of stereo recording (in MiB)	=	21168000 ÷ (8 x 1024 x 1024)	= 2.52 MiB
Q8. The typical song stored on a music CD is 3 minutes and 30 seconds long. Assume each song is sampled at 44.1 kHz, 16 bits are used per sample and each song uses two channels.

Calculate how many typical songs could be stored on a 740 MiB CD.

Size of one channel (in bits)	=	Sample rate (Hz) x Sample resolution x Length of audio (in Sec)

	=	44100 x 16 x (3 x 60 + 30)

	=	44100 x 16 x 210 = 148176000 bits.

File size of stereo (in bits)	=	Size of one channel (in bits) x 2 Channels

	=	148176000 x 2 = 296352000 bits

File size of stereo (in bytes)	=	296352000 ÷ 8	= 37044000 bytes

Number of songs on 740 MiB	=	740 MiB (in bytes) ÷ Size of a stereo song (in bytes)

	=	(740 x 1024 x 1024) ÷ 37044000	= 20.95

	=	20 songs could be stored on a 740 MiB CD (rounded down, since only whole songs can be stored).
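The stereo calculations in Q7 and Q8 can be verified with a short Python sketch. The function names are assumptions for this example.

```python
def stereo_size_bytes(sample_rate, sample_resolution, seconds, channels=2):
    """Uncompressed size of a multi-channel recording in bytes."""
    return sample_rate * sample_resolution * seconds * channels // 8


def songs_on_disc(disc_mib, song_bytes):
    """Number of whole songs that fit on a disc of disc_mib MiB."""
    return (disc_mib * 1024 * 1024) // song_bytes


# Q7: 30 seconds, 44.1 kHz, 8 bits per sample, two channels.
q7_bytes = stereo_size_bytes(44100, 8, 30)
print(round(q7_bytes / (1024 * 1024), 2))  # 2.52 (MiB)

# Q8: 3 min 30 s, 44.1 kHz, 16 bits per sample, two channels, 740 MiB CD.
q8_bytes = stereo_size_bytes(44100, 16, 3 * 60 + 30)
print(q8_bytes, songs_on_disc(740, q8_bytes))  # 37044000 20
```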

Data Compression

Q9. a) State what is meant by Data / File Compression

File Compression is the process of encoding data more efficiently to achieve a reduction in file size.
It helps to reduce resource usage, such as data storage space or transmission capacity.
Compressed data has to be decompressed before it is used.
Compression can be either Lossy or Lossless.

b) Why is it necessary to compress files?

It helps to store more data in less storage space.
It helps to transmit a large amount of data using fewer bits.
It minimises the download and upload time.

Q10. a) Describe how Lossless compression reduces the file size.

Lossless compression reduces file size by identifying and replacing repeated data bits or words with shorter codes.
No information is lost in lossless compression.
The original data is completely recovered when decompressed.
It is used to compress files which cannot afford any loss of data, like documents, databases, program code, etc.

b) Explain how program file, text or document files could be compressed.

Program and text files use lossless compression to reduce their size, by storing repeated words in a table and replacing each occurrence with its index or numerical value.
The algorithm identifies repeated phrases and replaces them with shorter codes.
The original data is completely recovered when the file is decompressed before use.

Q11. Explain how the sentence below would be stored with a reduction of about 40% (ignoring spaces).

“COMPARE TEXT FILES IN A COMPUTER AFTER FILE COMPRESSION”

Store the repeated groups of characters in a table with an index number, like 1 – COMP, 2 – FILE, 3 – TER
Replace each occurrence of these groups with its index number to transform the sentence into –

“1ARE TEXT 2S IN A 1U3 AF3 2 1RESSION”

This gives a reduction from 47 to 28 characters which is about 40%.
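The substitution in Q11 can be sketched in Python as below. This is a simplified illustration, not a real compression algorithm: the function names are assumptions, the table is written by hand, and decompression only works because the original sentence contains no digits.

```python
TABLE = {1: "COMP", 2: "FILE", 3: "TER"}  # index -> repeated group of characters


def dictionary_compress(text, table):
    """Replace every occurrence of each table entry with its index number."""
    for index, fragment in table.items():
        text = text.replace(fragment, str(index))
    return text


def dictionary_decompress(text, table):
    """Restore the original text (assumes it contained no digits)."""
    for index, fragment in table.items():
        text = text.replace(str(index), fragment)
    return text


def length_without_spaces(text):
    return len(text.replace(" ", ""))


sentence = "COMPARE TEXT FILES IN A COMPUTER AFTER FILE COMPRESSION"
compressed = dictionary_compress(sentence, TABLE)
print(compressed)  # 1ARE TEXT 2S IN A 1U3 AF3 2 1RESSION
print(length_without_spaces(sentence), length_without_spaces(compressed))  # 47 28
```

Because `dictionary_decompress(compressed, TABLE)` returns the original sentence exactly, the method is lossless. The character counts ignore the space needed to store the table itself.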

Q12. a) Describe how Run Length Encoding (RLE) algorithm is used to reduce the file size.

⇒ Run Length Encoding is a lossless data compression algorithm.

⇒ It reduces the size of a string that contains runs of consecutive identical data (e.g. repeated characters in text or repeated pixels in an image).

⇒ Each run of repeating data in the string is encoded with two values :

the first value represents the number of identical data items in the run.

the second value represents the code of the repeated data item (ASCII code for a character, pixel colour value for an image).

⇒ RLE is only effective where there are long runs of repeated units of data.

b) Explain how the size of the following text string could be reduced using Run Length Encoding (RLE) algorithm.

'a a a a a b b b b c c d d d d d'

This string contains 16 characters. If each character requires 1 byte of memory, then this string needs 16 bytes.
Using ASCII code, this string can be coded as follows :

5 97 4 98 2 99 5 100

Where, in each pair, the first values 5, 4, 2 and 5 are the run lengths (number of repetitions) and 97, 98, 99 and 100 are the ASCII codes of the repeating characters.

Assuming each number requires 1 byte of memory, the RLE code will need 8 bytes (4 + 4). This is half the original file size.
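The encoding in Q12 b) can be reproduced with a minimal Python sketch. The function names are assumptions, and the string is written without the spaces shown in the question, since the count of 16 characters does not include them.

```python
from itertools import groupby


def rle_encode(text):
    """Encode text as pairs: (run length, ASCII code of the character)."""
    encoded = []
    for char, run in groupby(text):
        # Each run length must stay below 256 to fit in 1 byte.
        encoded.extend([len(list(run)), ord(char)])
    return encoded


def rle_decode(values):
    """Rebuild the original text from (run length, ASCII code) pairs."""
    pairs = zip(values[0::2], values[1::2])
    return "".join(chr(code) * count for count, code in pairs)


text = "aaaaabbbbccddddd"
encoded = rle_encode(text)
print(encoded)                     # [5, 97, 4, 98, 2, 99, 5, 100]
print(len(text), len(encoded))     # 16 8
print(rle_decode(encoded) == text) # True
```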

c) Explain how the size of the following text string could be reduced using Run Length Encoding (RLE) algorithm.

'a a a a a a a a b b b b b b b b b b c d c d c d c c c c c c c c'

This string contains 32 characters. If each character requires 1 byte of memory, then this string needs 32 bytes.
A flag value of 255 is used to indicate a run of repeated characters. The flag is followed by a pair of values: the first value is the run length and the second value is the ASCII code of the character.
Characters that are not part of a run are stored without the flag, using only their ASCII code.
Using this algorithm, this string can be coded as follows :

255 8 97 255 10 98 99 100 99 100 99 100 255 8 99

Assuming each number requires 1 byte of memory, the RLE code now requires 15 bytes.
This gives a reduction in file size of about 53% when compared to the original string.
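The flag-based scheme of Q12 c) can be sketched in Python as follows. The names and the `MIN_RUN` threshold are assumptions: the notes do not say how long a run must be before the flag is used, so this sketch flags runs of 4 or more characters, where the 3-value code is shorter than the run itself.

```python
from itertools import groupby

FLAG = 255    # marks the start of a (run length, ASCII code) pair
MIN_RUN = 4   # assumed threshold: shorter runs are stored as plain ASCII codes


def rle_encode_flagged(text, min_run=MIN_RUN):
    encoded = []
    for char, run in groupby(text):
        length = len(list(run))
        if length >= min_run:
            encoded.extend([FLAG, length, ord(char)])
        else:
            encoded.extend([ord(char)] * length)
    return encoded


def rle_decode_flagged(values):
    chars, i = [], 0
    while i < len(values):
        if values[i] == FLAG:                      # flagged run: FLAG, length, code
            chars.append(chr(values[i + 2]) * values[i + 1])
            i += 3
        else:                                      # single unflagged character
            chars.append(chr(values[i]))
            i += 1
    return "".join(chars)


text = "a" * 8 + "b" * 10 + "cdcdcd" + "c" * 8
encoded = rle_encode_flagged(text)
print(encoded)
print(len(text), len(encoded))  # 32 15
```

The flag works here because ASCII codes are below 128, so the value 255 can never be mistaken for a character.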

Q13. a) Describe how the following Black and White image could use Run Length Encoding (RLE) algorithm to reduce its size without losing its quality.

The resolution of this image is 8 x 8 = 64 pixels. Assuming that each pixel requires 1 byte of storage, the file size would be 64 bytes.
Using RLE, this image could be coded as follows :

9W 6B 2W 1B 7W 1B 7W 5B 3W 1B 7W 1B 7W 1B 6W

Representing white pixel W as 1 and black pixel B as 0, we get -

91 60 21 10 71 10 71 50 31 10 71 10 71 10 61

The compressed RLE code has 30 values and therefore needs only 30 bytes to store the image.
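The run list in Q13 a) can be expanded back into pixel rows with a short Python sketch. The function name `decode_runs` is an assumption, and so is the reading order: the runs are taken to describe the pixels left to right, one row after another, in an image 8 pixels wide.

```python
RUNS = "9W 6B 2W 1B 7W 1B 7W 5B 3W 1B 7W 1B 7W 1B 6W"


def decode_runs(run_text, width=8):
    """Expand runs such as '9W' into rows of W (white) and B (black) pixels."""
    pixels = "".join(token[-1] * int(token[:-1]) for token in run_text.split())
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


rows = decode_runs(RUNS)
for row in rows:
    print(row)

total_pixels = sum(len(row) for row in rows)
stored_values = 2 * len(RUNS.split())   # one count and one colour code per run
print(total_pixels, stored_values)      # 64 30
```

Since every pixel is recovered exactly, the compression is lossless.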

b) Describe how the following Coloured image could use Run Length Encoding (RLE) algorithm to reduce its size without losing its quality.

The resolution of this image is 8 x 8 = 64 pixels with four colours (black, white, red and green). Assuming that each pixel is stored as an RGB colour (a mixture of the three colours red, green and blue, using 1 byte each), each pixel would require 3 bytes of storage and the file size would be 8 x 8 x 3 = 192 bytes.
The four colours of the image could be coded as -

Pixel colour	Red	Green	Blue
Black	0	0	0
White	255	255	255
Green	0	255	0
Red	255	0	0
Using RLE, this image could be coded as follows :

2 0 0 0 4 0 255 0 3 0 0 0 6 255 255 255 1 0 0 0 2 0 255 0 4 255 0 0 4 0 255 0

1 255 255 255 2 255 0 0 1 255 255 255 4 0 255 0 4 255 0 0 4 0 255 0 4 255 255 255

2 0 255 0 1 0 0 0 2 255 255 255 2 255 0 0 2 255 255 255 3 0 0 0 4 0 255 0 2 0 0 0

The compressed RLE code now has 92 values and therefore needs only 92 bytes to store the image. This gives a file reduction of about 52%.
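The figures quoted for the coloured image can be checked with a minimal Python sketch. It assumes each run is stored as four values (run length, red, green, blue); the variable names are illustrative.

```python
RLE_TEXT = """
2 0 0 0 4 0 255 0 3 0 0 0 6 255 255 255 1 0 0 0 2 0 255 0 4 255 0 0 4 0 255 0
1 255 255 255 2 255 0 0 1 255 255 255 4 0 255 0 4 255 0 0 4 0 255 0 4 255 255 255
2 0 255 0 1 0 0 0 2 255 255 255 2 255 0 0 2 255 255 255 3 0 0 0 4 0 255 0 2 0 0 0
"""

values = [int(v) for v in RLE_TEXT.split()]

# Group the values into runs of (count, red, green, blue).
runs = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]

pixel_count = sum(count for count, red, green, blue in runs)
original_bytes = pixel_count * 3        # 3 bytes per uncompressed RGB pixel
compressed_bytes = len(values)          # 1 byte per stored value
reduction = 100 * (1 - compressed_bytes / original_bytes)

print(pixel_count, original_bytes, compressed_bytes, round(reduction))  # 64 192 92 52
```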

Q14. a) Describe how Lossy compression reduces the file size.

It reduces the file size by permanently removing less important data, without which the file can still serve its purpose.
It permanently removes the details which humans cannot perceive.
The algorithm decides which parts of the file need to be retained and which parts can be discarded.
It is not possible to recover the original file after compression.
It is generally used to compress audio, video and image files.

b) Describe how lossy compression is used to reduce an Image file size.

It reduces the file size by reducing the image resolution, i.e. the number of pixels along the width and height of the image, permanently removing pixels without which the image can still serve its purpose.
It also reduces the colour depth, which reduces the number of bits needed to represent each pixel.
As the compression increases, the quality of the image decreases.
It is not possible to recover the original file after compression.

c) Describe how lossy compression is used to reduce a Sound file size.

It reduces the file size by reducing the sample resolution (or bit depth), i.e. the number of bits used to represent each sample.
It also reduces the sample rate, i.e. the number of samples recorded per second.
As the compression increases, the quality of the sound decreases.
It is not possible to recover the original file after compression.
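Why the original sound cannot be recovered can be illustrated with a simplified Python sketch. It is not how a real audio codec works: it only reduces the bit depth from 16 to 8 bits and halves the sample rate, using made-up unsigned sample values and function names chosen for this example.

```python
def reduce_bit_depth(samples, from_bits=16, to_bits=8):
    """Keep only the most significant bits of each unsigned sample."""
    shift = from_bits - to_bits
    return [sample >> shift for sample in samples]


def restore_bit_depth(samples, from_bits=8, to_bits=16):
    """Scale samples back up; the discarded low bits cannot be recovered."""
    shift = to_bits - from_bits
    return [sample << shift for sample in samples]


def halve_sample_rate(samples):
    """Keep every second sample, halving the number of samples per second."""
    return samples[::2]


original = [1000, 1003, 1010, 40000, 40001, 65535]   # hypothetical 16-bit samples
reduced = reduce_bit_depth(original)
restored = restore_bit_depth(reduced)

print(reduced)                       # [3, 3, 3, 156, 156, 255]
print(restored)                      # [768, 768, 768, 39936, 39936, 65280]
print(restored == original)          # False
print(halve_sample_rate(original))   # [1000, 1010, 40001]
```

Each sample now needs half the bits, but small differences between samples have been lost for good.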

d) Describe how lossy compression is used to reduce a Video file size.

It reduces the file size by reducing the sample resolution and sample rate of the audio.
It reduces the pixel dimensions and colour depth of all the frames.
It reduces the frame rate of the video.
Once compressed, the file cannot be restored to its original form.

e) Give reason why to choose lossy over lossless compression to compress image or audio files.

It produces a much smaller compressed file than the lossless method, and the file still serves its purpose even though some data has been permanently deleted.

Q15. Explain how MP3 lossy compression algorithm reduces the audio file size retaining most of the original music quality.

⇒ An MP3 file uses lossy compression that reduces its size by about 90%, by permanently removing the sounds that the human ear cannot hear.

⇒ When two sounds play at the same time, it keeps the louder sound and removes the softer one, which would not be heard clearly anyway. This is called perceptual music shaping.

⇒ The MP3 algorithm then compresses the remaining data further using lossless compression, replacing repeated bit patterns with shorter codes, so this step does not reduce the quality.

Q16. Explain how MP4 lossy compression algorithm reduces the video file size retaining most of its original quality.

⇒ An MP4 file uses lossy compression that reduces its size by removing pixel information, such as subtle colour shades and brightness variations, which the human eye cannot distinguish in its video frames.

⇒ It removes the background noise and the sounds which the human ear cannot hear.

⇒ It stores only the data that has changed from one frame to the next.

⇒ Hence, the removed data does not noticeably affect the quality of the video.

Q17. Explain how JPEG lossy compression algorithm reduces an Image file size retaining most of its original quality.

⇒ A JPEG file uses lossy compression that reduces its size by removing pixel information, such as subtle colour shades and brightness variations, which the human eye cannot distinguish.

⇒ This is done by separating the colour of each pixel from its brightness, which then allows certain information to be discarded from the image without losing any noticeable image quality.

⇒ A JPEG file cannot be reversed to regain the raw data of its original bitmap image.

Q18. a) What are Uncompressed Image file formats?

⇒ RAW images are unprocessed and uncompressed images created by a camera or scanner. There are many different raw formats (like .raw, .cr2, .nef, .orf, .sr2, and more), as each camera company often has its own proprietary format.

b) Give two Lossless compression Image file formats.

BMP (file types ending in .bmp) :
BMP stands for Bitmap Image. These files are usually stored uncompressed, so no image data is lost, and are designed to store raw device-independent bitmap images.
TIFF (file types ending in .tif) :
TIFF stands for Tagged Image File Format. These files are stored uncompressed or with lossless compression and thus contain a lot of detailed image data; they are commonly used in medical imaging and in photo software.
PNG (file types ending in .png) :
PNG stands for Portable Network Graphics. These files use lossless compression and are commonly used to store web graphics, digital photographs, and images with transparent backgrounds.
GIF (file types ending in .gif) :
GIF stands for Graphics Interchange Format. It is limited to a range of 256 colours, supports transparent backgrounds and is used for animations. It uses lossless compression and is suitable for the web.

c) Give two Lossy compression Image file formats.

JPEG (file types ending in .jpg) :
JPEG stands for Joint Photographic Experts Group. These files are compressed using a lossy algorithm and are mostly used by digital cameras to store more photos on their memory cards, and for photographs on web pages.
AVIF (file types ending in .avif) :
AVIF stands for AV1 Image File Format. It is a recent image file format whose lossy compression can produce files about 50% smaller than JPEG images.

Q19. Give two differences between Lossless compression and Lossy compression.

Lossless compression	Lossy compression
Reduces the file size by replacing repeated data with shorter codes.	Reduces the file size by permanently deleting data without which the file can still serve its purpose.
File can be decompressed to its original state without losing any data.	File cannot be restored to its original form once it is compressed.

REVISION : Statements and their key computing terms.

Reduction of the size of a file by removing repeated or redundant pieces of data; this can be lossy or lossless -	Compression
The maximum rate of transfer of data across a network, measured in kilobits per second (kbps) or megabits per second (Mbps) -	Bandwidth
A file compression method that allows the original file to be fully restored during the decompression process, for example, run length encoding (RLE) -	Lossless file compression
A method used to reduce the size of a sound file using perceptual music shaping -	Audio compression
A lossy file compression method used for music files -	MP3
A lossy file compression method used for multimedia files -	MP4
From Joint Photographic Experts Group; a form of lossy file compression used with image files which relies on the inability of the human eye to distinguish certain colour changes and hues -	JPEG
A lossless file compression technique used to reduce the size of text and photo files in particular -	Run length encoding (RLE)
